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Summary. Hypoelliptic diffusion processes can be used to model a variety of phenomena 
in applications ranging from molecular dynamics to audio signal analysis. We study param- 
eter estimation for such processes in situations where we observe some components of the 
solution at discrete times. Since exact likelihoods for the transition densities are typically not 
known, approximations are used that are expected to work well in the limit of small inter-sample 
times At and large total observation times TV At. Hypoellipticity together with partial observa- 
tion leads to ill-conditioning requiring a judicious combination of approximate likelihoods for the 
various parameters to be estimated. We combine these in a deterministic scan Gibbs sampler 
alternating between missing data in the unobserved solution components, and parameters. 
Numerical experiments illustrate asymptotic consistency of the method when applied to simu- 
lated data. The paper concludes with application of the Gibbs sampler to molecular dynamics 
data. 



1. Introduction 

In many application areas it is of interest to model some components of a large deterministic 
system by a low dimensional stochastic model. In some of these applications, insight from 
the deterministic problem itself forces structure on the form of the stochastic model, and 
this structure must be reflected in parameter estimation. In this paper, we study the fitting 
of stochastic differential equations (SDEs) to discrete time series data in situations where 
the model is a hypoelliptic diffusion process, meaning that the covariance matrix of the 
noise is degenerate, but the probability densities are smooth, and also where observations 
are only made of variables that are not directly forced by white noise. Such a structure 
arises naturally in a number of applications. 



One ap plication is the m odelling of macro-molecular systems, see lGrubmuller and Tavan 



3P 

(1994) and lHummerl (|2005f ). In its basic form, molecular dynamics describes the molecule by 



a large Hamiltonian system of ordinary differential equations (ODEs). As is commonplace 
in chemistry and physics, we will refer to data obtained from numerical simulation of such 
models as molecular dynamics data. If the molecule spends most of its time in a small 
number of macroscopic configurations then it may be appropriate to model the dynamics 
within, and in some cases between, these states by a hypoelliptic diffusion. While this 
phrasing of t he questio n is re latively recent, under the name of the "Kra mers prob l em" i t 
dates back to Kramers! ( 194dh with a brief summary in section 5.3.6a of Gardiner (1985). 
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Another application, audio signal analysis, is referred to in Giann opoulos and Godsilll (|200ll ) 
where a continuous time ARMA model is used, see also iGodsill and Yang (|2006r ) for more 
on the type of methodology used. 

We consider SDE models of the form 



dx = eA(x)dt + CdB 
x(0) = x 



(1) 



where B is an m-dimensional Wiener process and x a fc-dimensional continuous process with 
k > m. A : R k — ► R' is a set of (possibly non-linear) globally Lipschitz force functions. 
The parameters which we estimate are the last m rows of the drift matrix (the first k — m 
rows of which are assumed to be known), 8 £ R fcx/ , and the diffusivity matrix C which we 
assume to be of the form 



C 





r 



nfeX? 



where T £ R raxra is a constant nonsingular matrix. Thus, we are estimating drift and 
diffusion parameters only in the coordinates which are directly driven by white noise. 

It is known that under suitable hypotheses on A and C, a unique L 2 - integrable solutio n 




x(-) exists almost-surely for all times t £ R + , see e.g. Theorem 5.2.1 in I Oksendal 
We also assume that the process defined by ([Tj is hypoelliptic as defined in iNualartl I 
Intuitively, this corresponds to the noise being spread into all components of the system ([1]) 
via the drift. 

The structure of C implies that the noise acts directly only on a subset of the variables 
which we refer to as rough. It may then be transmitted, through the coupling in the drift, 
to the remaining parts of the system which we refer to as smooth (we do not mean C°° 
here, but they are at least C 1 ). To distinguish between rough and smooth variables, we 
introduce the notation x{t) T = (u(t) T ,v(t) T ) where u(t) £ R fc_m is smooth and v(t) £ R m 
is rough. It is helpful to define projections V : R k -> R fe ~ m by Vx = u and Q : R k -> R m 
by Qx — v. 

We denote the sample path at N+l equally spaced points in time by {x n — x(nAt)}^ =Q , 
and we write x„ = to separate the rough and smooth components. Also, for any 

sequence (z\, . . . , Zff), N £ N we write Az„ = z n+ i — z n to denote forward differences. We 
are mainly interested in cases where only the smooth component, u, is observed and our 
focus is on parameter estimation for all of T and for entries of those rows of corresponding 
to the rough path, on the assumption that {u n }n=o are samples from a true solution of (fTj): 
such a parameter estimation problem arises naturally in many applications and an example 
is given in section [7] We will describe a deterministic scan Gibbs sampler to approach this 
problem, sampling alternatingly from the missing path {v n }^ =0 , the drift parameters and 
the covariance IT T . It is natural to consider NAt = T ^> 1 and At <C 1. 

Given prior distributions for the parameters, po(Q,Tr T ), the posterior distribution can 
be constructed as follows: 



\v,e,rr T \ u ) 



(t),e,rr J 



p(u) 

C(u, w|e,rr T ) 



T(u) 



(2) 



Here, C(u,v\Q,TT T ) has been introduced as a measure equal to the probability density 
P(u, v\Q, rr T ) up to a constant of proportionality. When u, v are fixed and C(u, v\Q, TT T ) 
is thought of as a function of and IT T it is a a likelihood. 
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Similarly, the probability densities P(«|9, IT T , u), F(Q\v, IT 7 , u) and P(IT T |«, 8,u) 
are replaced by corresponding expressions using C when omitting constants of proportion- 
ality that are irrelevant to estimation of the posterior probability. The probability density 
P(it, u|0,IT T ) gives rise to the transition density P(u„ + i, v n +i\u n , v n , 9, IT T ) which we 
will write as £(u n+ i, v n +i\u n , v n ,Q, TT T ) when omitting constants of proportionality. 

In principle, ^ can be used as the basis for Bayesian sampling of (0,IT T ), viewing 
v as missing data. However, the exact probability of the path, P(it, v |6,rr T ), is typically 
unavailable. In this paper we will combine judicious approximations of this density to solve 
the sampling problem. 

The sequence {x n }n=o defined above is generated by a Markov chain. The random map 
x n i — ► x n +i is determined by the integral equation 



'-<n+l 



(n+l)At /.(n+l)At 

QA(x(s))ds + / CdB(s). 

At JnAt 



The Euler-Maruyama approximation of this map gives 



X n 



+1 



X n + AtQA(X„ 



/ Aii?(0, 8)£„ 



(3) 



where X n ^ n G R k , £ n is an iid sequence of normally distributed random variables, £<n 
AT (0,1), and 



#(0,8) 





o r 



r,kxk 



is not invertible. (Here, as throughout, we use uppercase letters to denote discrete-time ap- 
proximations of the continuous time process.) This approximation corresponds to retaining 
the terms of order O(At) in the drift and of Q{\fKt) in the noise when performing an Ito- 
Taylor expansion (see chapter 5 of lKloeden and Platen ( 19921 )). Due to the non-invertibility 
of i?(0, 0), this approximation is unsuitable for many purposes and we extend it by adding 
the first non-zero noise terms arising in the first k — m rows of the Ito- Taylor expansion for 
X n+ i. This results in the expression 



X n+1 « X n + AtQA(X n ) + VA~tR(At: 9)£„ 



(4) 



where X n £ R k , G R fc is distributed as W(0, 1) and R(At; 9) G R kxk . Because of the 
hypoellipticity, R(At; 9) is now invertible, but the zeros in C mean that it is highly ill- 
conditioned (or near-degenerate) for < At <C 1. Specific examples for the matrix R will 
be given later. 

Ideally we would like to implement the following deterministic scan Gibbs sampler: 



(a) Sample 9 from P(9|u, v, TT T ). 

(b) Sample IT T from P(IT T |u, v, 9). 

(c) Sample v from F(v\u, 9, rr T ). 

(d) Restart from step (a) unless sufficiently equilibrated. 



In practice, however, approximations to the densities P will be needed. We refer to expres- 
sions of the form ([4]) as models and we will use them to approximate the exact density on 
path-space, F(u, v\Q, IT T ), of the path u,v for parameter values 9 and IT T . The result- 
ing approximations, F E (U, V\Q, TT T ) and F IT (U, V\Q, TT T ) of F(u, v\Q, TT T ), are found 
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from ([3]) and ^ respectively. We again use Ce and Cit in the same way as for the exact 
distribution P above when omitting constants of proportionality. 
The questions we address in this paper are: 

i How does the ill-conditioning of the Markov chain {x n }^ =0 affect parameter estima- 
tion for rr T and for the last m rows of 8 in the regime At < 1, NAt = T ^> 1 



ii In many applications, it is natural that only the smooth data {u n }„ =Q i s observed, 
and not the rough data {v n }n=o- What effect does the absence of observations of the 
rough data have on the estimation for At <C 1 and N At = T ^> 1? 

iii The exact likelihood is usually not available; what approximations of the likelihood 
should be used, in view of the ill-conditioning? 

iv How should the answers to these questions be combined to produce an effective Gibbs 
loop to sample the distribution of parameters 0, IT T and the missing data {u n }^_ ? 

To tackle these issues, we use a combination of analysis and numerical simulation, based 
on three model problems which are conceived to highlight issues central to the questions 
above. We will use analysis to explain why some seemingly reasonable methods fail, and 
simulation will be used both to extend the validity of the analysis and to illustrate good 
behaviour of the new method we introduce. 

For the numerical simulations, we will use either exact discrete time samples of ([T]) in 
simple Gaussian cases, or trajectories obtained by Euler-Maruyama simulation of the SDE 
on a temporal grid with a spacing considerably finer than the observation time interval At. 

In section 2 we will introduce our three model problems and in section 3 we study the 
performance of Ce to estimate the diffusion coefficient. Observing and analysing its failure 
in the case with partial observation leads to the improved statistical model yielding Cit 
which eliminates these problems; we introduce this in section 4. In section 5 we show 
that Lit is inappropriate for drift estimation, but that Ce is effective in this context. In 
section 6, the individual estimators will be combined into a Gibbs sampler to solve the 
overall estimation problem with asymptotically consistent performance being demonstrated 
numerically. Section 7 contains an application to molecular dynamics and section 8 provides 
concluding discussion. 

We introduce one item of notation to simplify the presentation. Given an invertible 
matrix R £ ffi. Ttxn we introduce a new norm using the Euclidean norm on M™ by setting 
\\x\\ R = \\R~ x x\\ 2 for vectors x € M™. 

1.1. Two classical estimators 

From previous work on hypoelliptic diffusions, we note a classical estimator for the covari- 
ance matrix and for the drift matrix in the linear fully observed case which will be useful 
for reference later in the paper. 

Firstly, it is straightforward to estimate the covariance matrix rr T from the quadratic 
variation: noting that 



? 




N-l 



as N 



CO, 



(5) 



n=0 



with T = NAt fixed, see lDurrettl (|l996f ). 
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The Girsanov formula gives rise to a maximum likelihood estimator for the lower rows 
of 0, and in the linear case, where A is just the identity, the maximum likelihood estimate 
for the whole of O is given by 

,t ,T 

e = [/ dxx T ][ xx T dt]- 1 . (6) 



For th e hypoelliptic case, this is proved to be consistent as T — > oo in Breton and Musielal 
( 19851) . 



2. Model Problems 

To study the performance of parameter estimators, we have selected a sequence of three 
Model Problems ranging from simple linear stochastic growth through a linear oscillator 
subject to noise and damping to a nonlinear oscillator of similar form. All these problems 
are second order hypoelliptic and they have a physical background, so we use q (position) 
and p (momentum) to denote smooth and rough components in the Model problems instead 
of u and v which we used in the general case. Their general form is given as the second 
order Langevin equation 

dq = pdt, , , 

dp = (-7P+ f{q;D))dt + adB { ' ' 

where / is some (possibly nonlinear) force-function parametrised by D and the variables q 
and p are scalar. The parameters 7, D and a are to be estimated. 



2. 1. Model Problem I: Stochastic Growth 

Here, x — (q,p) T satisfies 

dq = pdt 



dp 



crdB. 



(8) 



The process has one parameter, the diffusion parameter <j, that describes the size of the 
fluctuations. In the setting of (p} we have 



A(x) = x 



e = 



1 




c = 



and u = q, v = p. The process is Gaussian with mean and covariancc 



d 


t 




<7o 





1 




ro 



The exact discrete samples may be written as 



and E(t) = a 1 



t 3 /3 t 2 /2 
t 2 /2 t 



9n+l 
Pn+1 



A . . (At) 3 / 2 .! 1 ) , (At) 3 ' 2 >(2) 
p n At + a K Qn + C" , 



(9) 



with Co ~ A/"(0, 



1 
1 



and {C«}„ = o being i.i.d.; individual components of Cn are referred 



to as and respectively. The matrix R from Q is given here as 



R = a 



1 
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In the case of this model problem, the auxiliary model is actually exact. 



2.2. Model Problem II: Harmonic Oscillator 

As our second model problem we consider a damped harmonic oscillator driven by a white 
noise forcing where x = (q,p) T : 



dq = pdt 

dp = —Dqdt — rypdt + crdB. 
This model is obtained from the general SDE (fTJ) for the choice 



(10) 



a (x) = x, e = 





-D 



-7 



c = 



and u — q, v — p. The process is Gaussian and the mean and covariance of the solution can 
be explicitly calculated. The matrix R is the same as in Model Problem I. 



2.3. Model Problem III: Oscillator with Trigonometric Potential 

In the third model problem, x = (q,p) T describes the dynamics of a particle moving in a 
potential which is a superposition of trigonometr ic functions and in contact with a heat bath 
obeying the fluctuation-dissipation relation, see lLasota and Mackevl (jl994 ). This potential 
is sometimes used in molecular dynamics in connection with the dynamics of dihedral angles 
- see section [7] The model is 



dq 

dp 



pdt, 

{-IP - Zw=i Dj sm (<7) cos^~ 1 (q))dt + odB. 



(11) 



This equation has parameters 7, -D,, i = 1, . 
SDE © for the choice 



, c and a. It can be obtained from the general 



.4 



sin(g) 
sm(q)cos(q) 

sin(<7) cos c ^ 1 (q) 
P 



e = 





-Dt 



1 

D c -7 



C = 



and u = q, v = p. No explicit closed-form expression for the solution of the SDE is known 
in this case; the process is not Gaussian. The matrix R in the statistical model ((4J is the 
same as the one obtained in Model Problem I. 



3. Euler Auxiliary Model 

As discussed in the introduction, we need to find appropriate approximations for P in steps 
(a)-(c) of the desired Gibbs loop. The purpose of this section is to show that use of P_g in 
step (c) , to sample the missing component of the path, leads to incorrect estimation of the 
diffusion coefficient. The root cause is the numerical differentiation for the missing path 
which is implied by the Euler approximation. 
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3. 1 . Auxiliary Model 

If the force function A(-) is nonlinear, closed-form expressions for the transition density are 
in general unavailable. To overcome this obstacle, one can use a discrete time auxiliary 
model. The Euler model is commonly used and we apply it to a simple linear model 
problem to highlight its deficiencies in the case of partially observed data from hypoelliptic 
diffusions. 

The Euler-Maruyama approximation of the SDE fl} is 

X n+1 = X n + AtOA(X n ) + VAlCCn (12) 

where £ n ~ A/"(0, 1) is an i.i.d. sequence of m-dimensional vectors with standard normal 
distribution. This corresponds to 0$ with R(At; 0) replaced by i?(0; 0) from ([3]). Thus we 
obtain 

f u n+1 = u n + AtreA(x n ) _ 

\ V n+1 = V n + AtQQA(X n ) + VMT£, 

where now each element of the i.i.d. sequence £ n is distributed as JV(0,I) in R m . This 
model gives rise to the following density: 

c ND (u,v\e,rr T ) = 

lln=0 V 2 -l rrT l \~^' t 1 " ] i 

The Dirac mass insists that the data is compatible with the auxiliary model (|12|) . i.e. the 
V path must be given by numerical differentiation (ND) of the U path in the case of 0, 
and similar formulae in the general case. To estimate parameters we will use the following 
expression: 

^(^I9,rr^n^^^^ , (is) 

In the case when the Euler model is used to estimate missing components we assume that 
{U n }, {Vn} are related so that the data is compatible with the auxiliary model - that is, 
numerical differentiation is used to find {V n } from {U n }. 




3.2. Model Problem I 

The Euler auxiliary model for this model problem is 

{Qn + l = Qn + PnAt, 
Pn+1 =Pn +<jVAit n . 

Here, is an i.i.d. Af(0, 1) sequence. The root cause of the phenomena that we discuss 
in this paper is manifest in comparing ((9|) and (fl6|) . The difference is that the 0((Ai) 3 / 2 ) 
white noise contributions in the exact time series © do not appear in the equation for Q n . 
We will see that this plays havoc with parameter estimation, even though the Euler method 
is path-wise convergent. 

We assume that observations of the smooth component only, Q n , are available. In this 
case the Euler method for estimation (fT6|) gives the formula 



71 ~ At { ' 
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300 
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Complete Observation 
T=100, At=0.1 



Complete Observation 
T=10, At=0.1 



400r 
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200 
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400r 
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|H<o>=0. 99921 
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Partial Observation 
T=100, At=0.1 



Partial Observation 
T=10, At=0.1 



300 



|<a>=0.81607 



l«j>=0. 80636 



i 300 J 
200 II 



300 



Partial Observation 
T=10, At=0.01 



B<g>=0.81491 

jj 



1.5 0.75 1 1.25 1.5 0.5 0.75 1 1.25 1.5 0.5 0.75 1 1.25 1.5 



Fig. 1. Maximum likelihood estimates of a using Euler Model for 
Model Problem I. 

Top row: fully observed process; bottom row: partially observed 
process. 



for the missing data. In the following numerical experiment we generate exact data from 
© using the parameter value a = 1. We substitute P n given by (fTT)) into P^|) and find 
the maximum likelihood estimator for a in the case of partial observation. In the case of 
complete observation we use the exact value for {Pn}, from ([9]), and again use a maximum 
likelihood estimator for a from (fT5| . 

Using N = 100 timesteps for a final time of T = 10 with a = 1 the histograms for the 
estimated diffusion coefficient presented in the middle column of Figure 13.21 are obtained. 
The top row contains histograms obtained in the case of complete observation where good 
agreement between the true a and the estimates is observed. The bottom row contains 
the histograms obtained for partial observation using (fT7|l . The observed mean value of 
E<7 = 0.806 indicates that the method yields biased estimates. Increasing the final time to 
T = 100 (see left column of graphs in Figure ET2")) or increasing the resolution to At = 0.01 
(see right column of graphs in Figure |3~2|) do not remove this bias. 

Thus we see that, in the case of partial observation, a contains 0(1) errors which do not 
diminish with decreasing At and/or increasing T = N At. 



3.3. Analysis of why the missing data method fails 

Model Problem I can be used to illustrate why this method fails. We first argue that the 
method works without hidden data. Interpreting (fT5j) as a log-likelihood function wrt. a, 
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1 N ^ 



we obtain following expression in the case of stochastic growth: 

N-l 

log£ E (a\Q,P) = -2Nloga ' 

a z At 

n— u 

where A is the forward difference operator. The maximum of the log-likelihood function 
gives the maximum likelihood estimate, 

JV-l 



NAt 



In the case of complete data, (|9]) gives 



71=0 



JV-l 



- 2 = ^E(ci 2) ) 2 



(18) 



(19) 



By the law of large numbers, a 2 — > a 1 almost surely as N — > oo. This shows that the 
method works when the complete data is observed. 

Let us consider what happens when P is hidden. In this case, P n is estimated by 

Qn+i — Q 



Pn = 

But since q n is generated by we find that 

Pn+l 



At 



P, 



12 



and 



AP n = 



AP 



71+1 



2 

as/Al 



AP, 



a(2) , M-A) , 

Sn+1 ' Sri i /77S71+I 



I Sn+1 Sn 



! + 

When AP n is inserted in (fl"8"|) it follows that 

2 iV-1 / 



"(2 



4/V 
4/V 



K2) 



V3 



Ai) _A 1 )' 

Sjl+l S7l 



71=0 \ 

JV-l 



E Ci 



71 = 

JV-l 



Sn+1 



JV-l 

^E(c 



(2) 



71=0 



A*) 
Sn+1 



Sn+1 



The random variables {Cn}„=o are with £o ~ ^(0, /). So, by the law of large numbers, 



2 „2 



a almost surely as N — > oo. Furthermore, the limits hold in either of the cases 
where either NAt = T or At are fixed as N — + oo. This means that independently of what 
limit is considered, a seemingly reasonable estimation scheme based on Euler approxima- 
ti on results in O(l) errors in the diffusion coefficient. There is similarity here with work 
of Gaines and Lvonsl (|l997t l showing that adaptive methods for SDEs get the quadratic 
variation wrong if the adaptive strategy is not chosen carefully. 
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4. Improved Auxiliary Model 



The Euler auxiliary model fails to propagate noise to the smooth component of the solution 
and thus leads to estimating missing paths v with incorrect quadratic variation. A new 
auxiliary model is thus proposed which propagates the noise using what amounts to an 
Ito- Taylor expansion, retaining the leading order component of the noise in each row of the 
equation. The model is used to set up an estimator for the missing path using a Langevin 
sampler from path-space which is then simplified to a direct sampler in the Gaussian case. 
Numerical experiments indicate that the method yields the correct quadratic variation for 
the simulated missing path. 

The model is motivated using our common framework the Model Problems I, II and 
III, namely ((7]). The improved auxiliary model is based on the observation that in the 
second row of an Ito- Taylor expansion of the drift terms are of size O(At) whereas the 
random forcing term is "typically" (in root mean square) of size C(V At). Thus, neglecting 
the contribution of the drift term in the second row on the first row leads to the following 
approximation of |J7J| : 



Qn+l 




Qn 


+ At 


Pn+1. 







Pn 

f(Qn) ~ jPn 



r(n+l)At 
JnAt 

B({n H 



(B(s) - B(nAt))ds 
- l)At) - B{nAt) 



The random vector on the right hand side is Gaussian, and can be expressed as a linear 
combination of two independent normally distributed Gaussian random variables. Com- 
putation of the variances and the correlation is straightforward leading to the following 
statistical model: 



Qn+l 




Qn 


+ At 


_Pn+l_ 







Pn 

/(Qn) " jPn 



crVAtR 



(20) 



Here, £i and £2 are independent normally distributed Gaussian random variables and R is 
given as 



R = 



At At 
755 2 

1 



This is a specific instance of ((H). It should be noted that this model is in agreement 
with the Ito- Taylor approximation up to error terms of order 0(At 2 ) in the first row and 
O(Ati) in the second row and that higher order hypoelliptic processes can be approximated 
using a similarly truncated Ito- Taylor expansion. The key important idea is to propagate 
noise into all components of the system, to leading order. 

If complete observations are available, this model performs satisfactorily for the esti- 
mation of a. This can be verified analytically for Model Problem I in the same fashion 
as in section 13.31 Numerically, this can be seen from the first row (referring to complete 
observation) of Figure [2] for Model Problem I and from the first row of Figure [3] for Model 
Problem II. In both cases the true value is given by a = 1. See subsection 4.2 for a full 
discussion of these numerical experiments. 

If only partial observations are available, however, a means of reconstructing the hidden 
component of the path must be procur e d. A standar d procedure would be the use of the 
Kalman filter/smoother ( Kalmann 1960h . Catlinl ( 1989ft) which could then be combined with 
the expectation- maximisation (EM) algorithm ([Dempster et al.l (jl977l ) )Meng and van Dvk 
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(Il997l) ) i;o estimate parameters. In this paper, however, we employ a Bayesian approach 
sampling directly from the posterior distribution for the rough component, p, without fac- 
torising the sampling into forward and backward sweeps. 



4.1. Path Sampling 

The logarithm of the density on path space for the missing data induced by the auxiliary 
model (@| can be written as follows: 



1 N 

log£ IT (p\q, 6, rr T ) = -- ^ \\AX t - QA{Xi)Atf R + const. (21) 



i=o 



We will apply this in the case ([2H)l which is a specific instance of 

One way to sample from the density on pa th space, £rr(P), for rough paths {Pi}fL 



via the Langevin equation (see section 6.5.2 in Robert and Casella ( 19991) ) and, in general, 



we expect this to be effective in view of the high dimensionality of P. Other MCMC 
approaches may also be used. 

However, when the joint distribution of {Pi]f = i is Gaussian it is possible to generate 
independent samples as follows: note first, that in the Gaussian case, when Cit in (|21j) is 
quadratic in P, the derivative of log C tt with r espect to the rough path P can be computed 



explicitly, which is carried out in I Poker nl (|2007| ). For our oscillator framework, the derivative 



can be expressed using a tridiagonal, negative definite matrix P ma t with highest order stencil 
— 1 — 4 — 1 acting on the P-vector plus a possibly nonlinear contribution Q(Q) acting on 
the Q-vector only: 

V p log C IT (Q,P) = PmatP + Q(Q). 

Then, the suggested direct sampler for P-paths is simply: 

p = -p-^Q(Q) + u- 1 z (22) 

Here U T U = — P ma t is a Cholesky factorisation and £ is a dimension N vector of iid normally 
distributed random numbers. 



4.2. Estimating Diffusion Coefficient and Missing Path 

The approximation £jt(P, O) can be used to estimate both the missing path p and 
the diffusion coefficient a for our Model Problems I, II and III. 
In order to estimate <r, the derivative of the logarithm of Cit 

log£ /T (cr|P, Q, O) = log£/ T (P, Q\a, O) + logp (O, a) + const 

(where priors po(0, er) are assumed to be given and constants in a have been omitted) with 
respect to a is computed: 

d 1 d 

— log£ /T = + —Z+—\og(p (e,a)). 

Oct a a° oa 

Here, we have used the abbreviation 

jV-l 

* == E 

p=0 



Qp+i 
Pi+i 



® p I - At ( Pp 

Pp J \ -f{Qn) - jPp 
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In this case no prior distribution was felt necessary as, when N — ► oo, its importance 
would diminish rapidly. Thus we set po = 1. 

We use a Langevin type sampler for this distribution. In order to avoid the singularity at 
(7 = we use the transformation £(<r) = <r 4 . Using the ltd formula, this yields the following 
Langevin equation which we use to sample £ and hence a: 

dC = ((12-8N)^/C + 4z)ds + 4V2C^dW. (23) 

A simple explicit Euler-Maruyama discretisation in s is used to simulate paths for this 
SDE. The timestep As needs to be tuned with N to ensure convergence of the explicit 
integrator. Since this is a one-dimensional problem, conservatively small timesteps and 
long integration times can be aff orded. With this choice of ti mestep As the theoretically 
possible transient behaviour (see Roberts and Tweedie (1997)) was not observed and we 



expect accurate samples from the posterior in a. 

This Langevin-type sampler (I23J ) can t hen be alternated in a Systematic Scan Gibbs 
Sampler (as described on p. 130 oflLiul (|2nnih ) using -/Vcibbs iterations with the direct sampler 



for the paths, (|22p . This yields estimates of the missing path and the diffusion coefficient 
which is estimated by averaging over the latter half of the A<Gibbs samples. We illustrate 
this with an example using Model Problem I with the following parameters: 

a=\, T e {10, 100}, At G {0.1, 0.01}, iV Gibbs = 50. 

The sample paths used for the fitting are generated using a sub-sampled Euler-Maruyama 
method with temporal grid A* where k = 30. The resulting histogram of mean posterior 
estimators is given in Figure [2] where the first row corresponds to the behaviour when 
complete observations are available and the second row corresponds to only the smooth 
component being observed and missing data being sampled according to (|22|) . For Model 
Problem II we use the following parameters: 

cr = l, D = 4, 7 = 0.5, 

Te{10,100}, At e {0.02,0.002}, 7V Gibbs = 50. 

The sample paths used for the fitting are generated as for Model Problem I and the exper- 
imental results are given in Figure [3l 

It appears from these figures that the estimator for this joint problem performs well for 
Model Problems I and II for At sufficiently small and T sufficiently large. A more careful 
investigation of the convergence properties is postponed to section 6 when drift estimation 
will be incorporated in the procedure. 



5. Drift Estimation 

5. 7. Overview 

With the approximations Ce and Cjt in place, the question arises which of these should be 
used to estimate the drift parameters. Using Model Problem II we numerically observe that 
an Ce based maximum likelihood estimator performs well. In contrast, ill-conditioning due 
to hypoellipticity leads to error amplification and affects the performance of the Cit based 
maximum likelihood estimator. 
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Fig. 2. Estimates of a using the Lit Model for Model Problem I. 

Top row: MLEs for fully observed process; 

bottom row: Mean Gibbs estimates for partially observed process. 




Fig. 3. Estimates of a using the Lit Model for Model Problem II. 

Top row: MLEs for fully observed process; 

bottom row: Mean Gibbs estimates for partially observed process. 
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5.2. Drift Parameters from C E 

In order to simplify analysis, we illustrate the estimator using Model Problems II, (llOp and 
III, (TTT1) . For the latter, the Euler auxiliary model is given as follows: 

Qn+l = Qn + AtP n 

where we abbreviated the trigonometric expressions using fj(q) = sin(q) cos J (g). The 
functional Ce in this case is given by: 

C E {l,D\Q,P,a) oc expl-^ n=0 ^ ^Kt^ H ■ ( 25 ) 

Clearly, this posterior is Gaussian with distribution 

Q^N(M E 1 b E ,M E 1 ), (26) 
where the matrix Me and the vector bs can be read off from (|25l) . 



5.3. Drift Parameters from Lit 

As the approximate model based on Cjt is observed to resolve the difficulty with estimating 
a for hidden p-paths, it is interesting to see whether it can also be used to estimate the 
drift parameters. 

The logarithm of the density on path space up to an additive constant is given by (|21[) . 
To illustrate the problems arising from the use of Lit we use Model Problem II, so that 
(22) becomes 



JV-l 



log C IT (e\Q,P, a) = ]T \\(AX n -AtGA(X n ))f R + const 



(27) 



where R = a 



At At 

1 



irrelevant constants have been omitted and we have 



" Qn ' 


) = 


Qn 


II 


1 






Pn _ 


-D -7 



.4 



In order to obtain a maximum likelihood estimator from this, we take the derivative with 
respect to the parameters D and 7 and equate to zero. This yields the following linear 
system: 



£„<#At EnPnQnAt 
E„ PnQnAt £„P„ 2 At _ 



J2 n QnAP n 



E 
E 



n2™l At r » 

3p / AQ n _ p 
n 2 r " \ At r ' n 



28) 



Comparing this linear system to the mean of the successful estimator (f26|) we note the 
presence of an additional term on the right hand side. This term leads to the failure of the 
above estimator. Thus, Cjt is not an appropriate approximation for use in step (a) of the 
Gibbs sampler. 
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T=1000, At=0.01 T=100, At=0.01 T=100, A t=0.001 




Fig. 4. Maximum likelihood drift estimates for Model Problem II, using Hit 

5.4. Numerical Check: Drift 

There are two factors influencing convergence: T and At. To illustrate their influence, 
consider the following series of numerical tests. All of the tests share these parameters: 

D = 4 7 = 0.5 cr = 0.5 A: = 30 

Data for the tests is again generated using an Euler-Maruyama method on a finer temporal 
grid with resolution At/k. In the plot given in Figure [4] the top row contains histograms for 
the maximum likelihood estimate for the drift parameter D whereas the second row contains 
histograms for the drift parameter 7 in any case using the full sample path for maximum 
likelihood inference, i.e. formula (|28|) . It is clear from these experiments summarised in 
Figure |H that both D and 7 are grossly underestimated by D,*f from ((25J). This problem 
does not resolve for smaller At (see the right column of that figure) ; it does not disappear 
for longer intervals of observation, either, as can be inferred from the left column of Figure 

5.5. Why the C IT Model Fails for the Drift Parameters 

The key is to compare (|28p with the mean in (|2"rj]) . This reveals that the last term in l|28p 
is an error term which we now study. 

Using the 2nd order Ito- Taylor approximation 

+ ^At 2 A 2 X n + 0{At^) 



X n 



+1 



X n + AtAX n 



1 

-7 1 



R 



6 
6 
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we can compute the second term on the right hand side of 



Tin 2*2" 



2 V 

Z^in 2 rn \ At 



( AQ„ _ p 
I At r n 



P, 



I 3 + 0(At). (29) 



Here, D and 7 refer to the exact drift parameters used to generate the sample path, whereas 
D and 7 in (|28p and (|29p are the drift parameters estimated using the improved auxiliary 
model. The term I s on the right hand side contains stochastic integrals whose expected 
value is zero. 

As the mean error terms can be written in terms of the matrix elements themselves, 
(|29|) can be substituted in (|28|) to obtain: 



ED 

E7 



D + O(At) 
7 + 0(Ai). 



(30) 
(31) 



This seems to be corroborated by the numerical tests. 



5.6. Conclusion for Drift Estimation 

We observed numerically but do not show here that Ce associated with an Euler model for 
the SDE |T]) yields asymptotically consistent Langevin and maximum likelihood estimators 
for Model Problem II. 

While it is aesthetically desirable to base the estimation of all parameters as well as 
the missing data on the same approximation Cit of the true density (up to multiplicative 
constants) C, and although this approximation was found to work well for the estimation 
of missing data and the diffusion coefficient, it does not work for the drift parameters. 

It is possible to trace this failure to the fact that only the second row of Q is estimated 
where O(At) errors in the first row get amplified to 0(1) errors in the second row. Estimat- 
ing all entries of 0, while being outside the specification of the problem under consideration, 
also yields 0(1) errors if Cit is used and so does not remedy the problem. This problem is 
not shared by the discretised version of the diffusion independent estimator ([6]) , but this is 
not a maximum likelihood estimator for Cit- 

In summary, for the purposes of fitting our model problems to observed data we employ 
the Euler auxiliary model (I25| for the drift parameters. 

6. The Gibbs Loop 

In this section, we combine the insights obtained in previous sections to formulate an effec- 
tive algorithm to fit hypoelliptic diffusions to partial observations of data at discrete times. 
We apply a deterministic scan Gibbs sampler alternating between missing data (the rough 
component of the path, v), drift parameters and diffusion parameters. 

We combine the approximations developed and motivated in previous sections in the 
following Gibbs sampler: 



(a) Sample 9 from P E {e\U,V,a). 

(b) Sample a from F IT {a\U, V,0). 
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(c) Sample V from F IT {V\U, 6, o). 

(d) Restart from step (a) unless sufficiently equilibrated. 



Our numerical results will show that this judicious combination of approximations results 
in an effective algorithm. Theoretical justification remains an interesting open problem. 
When applied to Model Problem III the detailed algorithm reads as follows: 

Algorithm 1. Given observations Qi, i = 1, . . . , N, the initial P-path is obtained using 
numerical differentiation: 

pio) = (32) 

The initial drift parameter estimate is just set to zero: {DfH = 0, = 0. Then start 
the Gibbs loop: 



For k = l,..., iVcibbs- 

(a) Estimate the drift parameters 7^') and {Dj}j =1 using sampling based on Ce given 
{ p i k ~ 1] }._ ma ©• 

(b) Estimate the diffusivity er( fc ) us ing the Langevin sampler (123|) based on Lit given 

(c) Get an independent sample of the P-path, < pj- k ' > using (|22p derived from Lit 

^ I J i—0 

given parameters j( k \ |_D^| and 



We test this algorithm numerically where sample paths of (jTTJ) are generated using a 
sub-sampled Euler-Maruyama approximation of the SDE. The data is generated using a 
timestep that is smaller than the observation time step by a factor of either k = 30 or 
k = 60. Comparing the results for these two and other non-reported cases, they are found 
not to depend on the rate of subsampling, k, if this is chosen large enough. The parameters 
used for these simulations are as follows: 



D = l, Di = -8, £> 2 = 8, 7 = 0.5, cr = 0.7, 

T = 500, Ate {!,..., i£g}, ^ G ibb s = 50. 

The trigonometric potential resulting from this choice of drift parameters is depicted on the 
left of Figure [5] and a typical samplepath is given on the right side of Figure [3 It should be 
noted that all sample paths are started at (q,p) — (1, 1). A typical sample path for q given 
in Figured] 

The performance of the Gibbs sampler for the sample g-path given in Figure [5] is shown 
in Figure [7] where 100 Gibbs steps sampling from the posterior distribution of drift and 
diffusion parameters are shown for the setup shown above except that here iVcibbs = 100 
and At = 0.01. Mean posterior estimators are computed averaging over the latter half of 
-^Gibbs iterations as before. This sampling is repeated up to 64000 times and we label the 
repeated-sampling average of these mean posterior estimators as (Dj) and (7). We then 
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Trigonometric Potential Typical q-path 




Fig. 5. Typical sample path for Model Problem III, T = 500 




log2(A t] 
Drift parameter D 





Fig. 6. Model Problem III, T = 500 : Displaying Averaged Mean Posterior Deviations of the Drift 
Parameters 
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Fig. 7. Model Problem III: Burn-in of Gibbs Sampler 

compute their deviation from the true values, ADi = (Di) — Di and plot ADi and A7 
versus At in a doubly logarithmic plot given in Figure [6l 

We seek to fit a straight line to the ADi in a doubly logarithmic plot to ascertain the 
order of convergence. Since a standard least squares fit proves inadequate, we employ the 
following procedure: 

Given averaged numerically observed parameter estimates yi and their numerically ob- 
served Monte Carlo standard deviations en obtained at timesteps Ati we fit b and c in the 
following linear regression: 

otiii =Vi — b— cAU. (33) 

Assuming that the errors are normally distributed (which is empirically found to be the 
case) a maximum likelihood fit for the parameters b and c can be performed and yields 
the asymptotic (for At — > 0) drift parameter values reported in Figure [6l Note that this 
fit constrains the slope of the fitted line in the doubly logarithmic plot to one. This is to 
minimise the number of parameters fitted and to improve the accuracy of the extrapolated 
value b which is the predicted value for y at At = 0. It can be observed in Figure [6] 
that this leads to good agreement with the observed average parameter values yi, and this 
corroborates the estimator's bias being of order O(At). 

Comparing the results for the two final times tested, T = 50 and T = 500, we find that 
the deviation of the asymptotic drift parameter (b in (|33[) ) from the true parameter value 
is consistent with it being O (i). This error is attributed to all sample paths having been 
started at (q,p) = (1, 1) rather than from a point sampled from the equilibrium measure. 

For the diffusion parameter <r, results analogous to those in Figure using the same 
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Diffusion parameter a - convergence 




log2(A t) 



Fig. 8. Model Problem III, T = 500 : Displaying Averaged Mean Posterior Deviations for a 



parameter values, are shown in Figure [8] (although that figure displays results for k = 30 
only). Asymptotic consistency can be observed from this figure with a naive least squares 
fit yielding a slope of O(At - 93 ). This is consistent with an O(At) error in the estimated 
diffusion parameter. 

From these considerations it is apparent that the numerical experiments' outcome is 
consistent with an 0(At) + (A) bias, so the Algorithm [T] is numerically observed to be an 
asymptotically unbiased estimator of the drift and diffusion parameters in the cases studied. 



7. Application to Molecular Conformational Dynamics 

As an application of fitting hypoelliptic diffusions using partial observations we consider 
data arising from molecular dynamics simulations of a Butane molecule using a simple heat 
bath approximation. 

By considering the origin of the data we demonstrate that it is natural to fit a hypoelliptic 
diffusion process which yields convergent results for diminishing inter-sample intervals At. 
Also, stabilisation of the fitted force function f(q) — X^=i Djfj(q) as the number of terms 
to be included, c, increases, is observed. Thus the Algorithm [1] is shown to be effective on 
molecular dynamics data, ft is also clear, though, that the resulting fit has only limited 
predictive capabilities as it fails to fit the invariant measure of the data at all well. However, 
this is a modelling issue which is not central to this paper. 



7.1. Molecular Dynamics 

The data used for this fitting example are generated using a molecular dynamics (MD) 
simulation for a single molecule of Butane. In order to avoid explicit computations for 
solvent molecules, several ad hoc approximate algorithms have been developed in molecular 
dynamics. One of the more sweeping approximations that is nonetheless fairly popular, at 
least as long as electrostatic effects of the solvent can be neglected or treated otherwise, 
is Langevin dynamics. Here, the time evolution of the Cartesian coordinates of the four 
extended atoms of Butane (see Figure |9]) is simulated using a damp ed-driven Hamiltonian 
system; details of the force field used can be found in iBrooksl (|f983h . 
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Fig. 9. Sketch of Dihedral Angle 



Butane Samplepath - Langevin Dynamics 



n 4 Histogram of Dihedral Angle 




x 10 




Fig. 10. MD Samplepath Butane. 

Left: First 500ps of sample path, Right: Histogram of whole sample 
path 



From a chemical point of view interest is focused on the dihedral angle omega, which 
is the angle between the two planes in K 3 formed by atoms 1, 2, 3 and atoms 2, 3, 4 respec- 
tively; see the sketch in Figure [9] Conformational change is manifest in this angle, and the 
Cartesian coordinates themselves are of little direct chemical interest. Hence it is natural to 
try and describe the stochastic dynamics of the dihedral angle in a self-contained fashion. 

One MD run is produced using a timestep of At = O.lfs (Throughout this section, we 
use the time unit fe mtosecond abb reviated to fs. Note that lfs = 10~ 15 s.) and a Verlet 
variant (see p. 435 in Schlickl ( 2000h ) covering a total time of T = 4- 10~ 9 s (4 nanoseconds). 
A section of the path of the dihedral angle as a function of time can be seen on the left of 
Figure [TOl the corresponding histogram for the whole of the path is depicted to the right of 
that figure. 

It should be stressed that the Ito process governing the behaviour of the dihedral angle 
uj is not of the form (|11|> . in particular, it will have a non-constant diffusivity a. So, fitting 
to this data tests the robustness of the fitting algorithm in a way that the experiments in 
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previous sections did not. 



7.2. Fitting 

We aim to fit the process from Model Problem III, equation (fTTj) . to a subsampled trajec- 
tory of w(tj) (viewed as the smooth component q) obtained from the molecular dynamics 
simulation described previously. Subsampling is performed because we have a profusion 
of data and because the hypoelliptic diffusion is expected to be a good fit only at some 
timescales. 

The simulation used to obtain the dihedral angle data is such that u>(t) will be a C 1 
function of time assuming a suitable interpretation of the periodicity in w, so it is natural 
to fit a hypoelliptic process of damped-driven Hamiltonian form. 

The physical time-units in seconds are minuscule and do not lead to estimated SDE 
parameters of order one. It transpires that, in order to obtain parameter values of order 
one, re-scaling time so that the final time becomes T — 80000 is a good choice. This rescaling 
is useful in comparing convergence properties with what was observed in Section 6. In order 
to assess consistency, the MD data is subsampled, at timesteps At G {lfs, 2fs, 3fs . . .} in 
physical time units, corresponding to {k0.02}keN in the rescaled time units. Algorithm [T] is 
then run for iVcibbs = 40 outer iterations on each path using a potential ansatz 



fc=l 



which corresponds to the force functions in (|lip setting — k<dk and / = V' ; the values 
. c G {3, 5, 7} are used in the sequel. These periodic ansatz functions are a natural choice 
for dihedral angle potentials, in fact, the dihedral angle potential given in iBrooksl (|l983l) 



is of this form. The obtained drift parameter estimates under subsampling at timestep 
At can be seen from Figure [IT] in the case c = 5. In this figure, the sampling timestep 
At is the abscissa and the drift and diffusion parameter estimates (0i, . . . , 65, 7 and er) 
obtained from fitting to the samplepath subsampled at timestep At is shown as the ordinate. 
This plot shows the behaviour of the drift and diffusion parameter estimates averaged over 
-^Gibbs = 100 Monte-Carlo samples 9\, . . . , #5,7 for different values of the subsampling rate. 
The behaviour as k — > indicates that the fitted parameter values converge to a well-defined 
limit; a in particular varies relatively little over a large range of subsampling rates. This 
suggests that the proposed algorithm is able to fit Model Problem III to molecular dynamics 
data. The fact that different (especially drift) parameter values are obtained at different 
subsampling rates indicates limitations in the fit to Model Problem III and this will be 
addressed in the next subsection. 



Parameter Estimation for Partially Observed Hypoelliptic Diffusions 23 




Fig. 11. Convergence for fitted MD path with subsampling 

Displays mean Gibbs estimates of drift and diffusion parameters as a function of subsampling interval 

At 



7.3. Limitations 

The desirable convergence properties of the algorithm in At and T should not be confused 
with inference about whether fitting this kind of model to this kind of MD data gives a 
good or a bad fit, it merely indicates that, using the algorithm suggested in this paper, it 
is possible to perform such fitting. 

To show limitations of the model in this particular application and see how the perfor- 
mance can be assessed using the fitting algorithm[T] we show posterior invariant probability 
densities resulting from the fitted trigonometric potentials. In order to do this, we convert 
the posterior drift parameter samples {.Dj }| =1 obtained at step m using input data sub- 
sampled at rate k = 1 to an invariant density g^ 71 ' specified by its values on an equidistant 
grid on the interval [— tt, tt]. These densities for m 6 {1, . . . , 1000} are then averaged and 
their standard deviation is computed point-wise on the grid. This results in the plot given 
in Figure 1121 There, we display results for three orders of trigonometric potential c to be 
fitted. These are contrasted with the empirically observed invariant density and the den- 
sity arising from the classic canonical thermodynamic ensemble which is proportional to 
cxp v j^} which are given in the plot at the bottom of Figure [T2"l For the force field 

used in the molecula r dyna mics simulation, it is known that the latter two agree in the limit 
T -> oo, see lFischerl (|l997| ). 

It should be stressed that in each of these experiments, convergence diagnostics indicate 
convergence of the Gibbs sampler and the posterior distributions for the drift and diffusion 
parameters are very concentrated and hence posterior variances both for the drift and 
diffusion parameters as well as the induced invariant densities are low. 
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Fig. 12. PDFs resulting from fitted potentials for different orders of trigonometric potential - Shaded 
regions display posterior variance 



With increasing polynomial order c we find some qualitative change in the resulting 
invariant density and also (in particular moving from c = 5 to c = 7) a marked increase in 
posterior variance. This goes hand-in-hand with a marked increase in the condition number 
of the drift parameter matrix Me in (|26p. It is simply an ill-conditioned problem to derive 
higher and higher order polynomial coefficients from a fixed length of observed path. 

It is observed that even though the empirically observed invariant density is smooth 
and close to the thermodynamical expectation, the fitted potentials induce an SDE whose 
invariant measure is not a good approximation of the empirical density. This may simply 
be attributed to the fact that the SDE that is being fitted does not represent a good model 
of the dynamics of the dihedral angle in the Butane molecule with second order Langevin 
heat bath model. 



8. Conclusions 

A hybrid algorithm for fitting drift and diffusion parameters of a hypoclliptic diffusion 
process, with constant diffusivity, from observation of smooth data at discrete times has 
been described. The method combines a Gibbs sampler together with differing approximate 
likelihoods employed in different steps of the Gibbs loop. Its performance has been validated 
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numerically for a number of test cases and an application to molecular dynamics data has 
been given. While parameter fitting can be viewed as an inverse problem for SDE solvers - 
and thus ill-conditioning of some kind is always to be expected - a detailed understanding 
of the particular ill-conditioning induced by hypoellipticity and partial observation has been 
attained. 

While only second order hypoelliptic problems have been treated in this article, the 
algorithm's applicability is expected to encompass order k hypoelliptic problems and it has 
been tested successfully on a third order example. Furthermore, non-linear p-dependence 
in the example ([7]) can be dealt with using a Langevin sampler for the missing path and 
this has also been tested. Additionally, observations that are not exactly equispaced can 
also be processed provided the maximal inter-sample time is sufficiently small. 

Further avenues of investigation include the use of imputed data-points between sam- 
ples to diminish O(At) errors; however there is a risk of bad mixing as <j is determined 
by the small scale behaviour of the process which would then be dominated by the im- 
puted data points. This h as been analysed in the case of elliptic diffusion processes in 
iRoberts and Stramei ( 2001 ) an d an application o f stand ard estimators to this problem in 
the hypoelliptic case is given in Godsill and Yangl ( 20061 ). 

Also, an extension to position dependent diffusion coefficients may prove useful, in par- 
ticular, in may render t he algorithm more useful in molecular dynamics contexts such as 
those in iHummer ( 2005 ). 
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